REPORT  DOCUMENTATION  PAGE 


Form  Approved  OMB  NO.  0704-0188 


The public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions,
searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments
regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington
Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington VA, 22202-4302.
Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to any penalty for failing to comply with a collection
of information if it does not display a currently valid OMB control number.

PLEASE  DO  NOT  RETURN  YOUR  FORM  TO  THE  ABOVE  ADDRESS. 


5c.  PROGRAM  ELEMENT  NUMBER 
611102 


5e.  TASK  NUMBER 

5f.  WORK  UNIT  NUMBER 


12. DISTRIBUTION AVAILABILITY STATEMENT
Approved  for  Public  Release;  Distribution  Unlimited 

13.  SUPPLEMENTARY  NOTES 
The views, opinions and/or findings contained in this report are those of the author(s) and should not be construed as an official Department
of the Army position, policy or decision, unless so designated by other documentation.

14.  ABSTRACT 

The  project  had  the  following  three  overall  goals,  all  of  which  have  now  been  accomplished: 

(1)  To  characterize  the  brain  mechanisms  of  camouflage-breaking. 

(2) To characterize the brain mechanisms of learning to break camouflage, or camouflage learning.

(3) To characterize the brain mechanisms of recognizing partially occluded camouflaged objects.


15.  SUBJECT  TERMS 

Visual  search.  Camouflage,  Functional  magnetic  resonance  imaging  (fMRI),  Perceptual  learning 

17. LIMITATION OF ABSTRACT
15. NUMBER OF PAGES

UU 

Standard Form 298 (Rev 8/98)
Prescribed  by  ANSI  Std.  Z39.18 


19a.  NAME  OF  RESPONSIBLE  PERSON 

Jay Hegde

19b.  TELEPHONE  NUMBER 
706-721-5129 


16.  SECURITY  CLASSIFICATION  OF: 

a.  REPORT 

b.  ABSTRACT 

c.  THIS  PAGE 

UU 

UU 

UU 

7.  PERFORMING  ORGANIZATION  NAMES  AND  ADDRESSES 

Medical  College  of  Georgia  Research  Institi 
1120  15th  Street 

Augusta, GA 30912-4810

9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS 
(ES) 

U.S.  Army  Research  Office 
P.O.Box  12211 

Research  Triangle  Park,  NC  27709-2211 


8.  PERFORMING  ORGANIZATION  REPORT 
NUMBER 


10.  SPONSOR/MONITOR'S  ACRONYM(S) 
ARO 

11.  SPONSOR/MONITOR'S  REPORT 
NUMBER(S) 

57983-LS.13 


5d.  PROJECT  NUMBER 


3.  DATES  COVERED  (From  -  To) 
2-Mar-2011 - 1-May-2015

5a.  CONTRACT  NUMBER 

W911NF-11-1-0105

5b.  GRANT  NUMBER 


2.  REPORT  TYPE 

Final  Report 

4.  TITLE  AND  SUBTITLE 

Final  Report:  Neural  Mechanisms  of  Recognizing  Camouflaged 
Objects:  A  Human  fMRI  Study 


6.  AUTHORS 
Jay  Hegde 


1.  REPORT  DATE  (DD-MM-YYYY) 
30-07-2015 


Report  Title 

Final  Report:  Neural  Mechanisms  of  Recognizing  Camouflaged  Objects:  A  Human  fMRI  Study 

ABSTRACT 

The  project  had  the  following  three  overall  goals,  all  of  which  have  now  been  accomplished: 

(1)  To  characterize  the  brain  mechanisms  of  camouflage-breaking. 

(2) To characterize the brain mechanisms of learning to break camouflage, or camouflage learning.

(3) To characterize the brain mechanisms of recognizing partially occluded camouflaged objects.

The overall finding of this project is that camouflage-breaking and camouflage learning both engage a rather unique network of brain
regions that is distinct from the networks involved in other closely related phenomena, such as visual search. Moreover, the network
activity is predictive of the subject's perceptual reports on a trial-to-trial basis. We are currently in the process of writing up the data for
publication. We expect to submit the first in a series of these manuscripts for review by September 1, 2015.


Enter  List  of  papers  submitted  or  published  that  acknowledge  ARO  support  from  the  start  of 
the  project  to  the  date  of  this  printing.  List  the  papers,  including  journal  references,  in  the 
following  categories: 

(a)  Papers  published  in  peer-reviewed  journals  (N/A  for  none) 


Received  Paper 


08/25/2012  6.00  Evgeniy  Bart,  Jay  Hegde.  Invariant  Object  Recognition  Based  on  Extended  Fragments, 

Frontiers  in  Computational  Neuroscience,  (08  2012):  1.  doi: 

08/29/2013 9.00 Xin Chen, Jay Hegde. Learning to Break Camouflage by Learning the Background,

Psychological  Science,  (12  2012):  1395.  doi: 

08/29/2013 12.00 Evgeniy Bart, Jay Hegde. Exploiting temporal continuity of views to learn visual object invariance,

Frontiers  in  Computational  Neuroscience,  (03  2013):  1.  doi: 

08/29/2013 11.00 Evgeniy Bart, Jay Hegde. Invariant recognition of visual objects: some emerging computational principles,
Frontiers  in  Computational  Neuroscience,  (08  2012):  1.  doi: 

TOTAL:  4 


Number  of  Papers  published  in  peer-reviewed  journals: 


(b)  Papers  published  in  non-peer-reviewed  journals  (N/A  for  none) 


Received  Paper 


TOTAL: 


Number  of  Papers  published  in  non  peer-reviewed  journals: 


(c)  Presentations 

(1)  Hauffen  K.,  Van  Loozen,  D.  and  Hegde,  J.  Attentional  interference  in  normal  and  impaired  vision.  3rd  Annual  Retreat  of  the  James  and 
Jean Culver Vision Discovery Institute. (2011)

(2)  Maestri,  M.,  Hauffen  K.  and  Hegde,  J.  A  novel  method  for  characterizing  cognitive  deficits  in  visual  perception.  3rd  Annual  Retreat  of 
the  James  and  Jean  Culver  Vision  Discovery  Institute.  (2011) 

(3)  Chen,  X.  and  Hegde,  J.  Learning  to  break  camouflage  by  learning  the  background.  3rd  Annual  Retreat  of  the  James  and  Jean  Culver 
Vision  Discovery  Institute.  (2011) 

(4)  Hegde,  J.  Frontiers  in  Neuroscience  lecture  series,  Emory  University  School  of  Medicine,  Atlanta,  GA.  (2011) 

(5)  Hegde,  J.  Grand  Rounds,  Department  of  Radiology,  Medical  College  of  Georgia,  Augusta,  GA.  (2011) 

(6)  Hegde,  J.  Society  for  Psychophysiological  Research  (SPR)  symposium  on  “Speed  channels:  Fast  dynamics  of  visual  discrimination  and 
attention”,  Portland,  OR.  (2011) 

(7) Hegde, J. Distinguished External Seminar Speaker Series, Center of Advanced Brain Imaging (CABI) and Departments of Psychology,
Georgia  Institute  of  Technology  and  Georgia  State  University,  Atlanta,  GA.  (2012) 

(8)  Hegde,  J.  Research  Experience  for  Undergraduates  (REU)  lecture.  Department  of  Psychology,  University  of  South  Carolina-Aiken, 
Aiken,  SC.  (Simulcast  via  satellite  to  the  REU  audience  at  the  University  of  Wisconsin,  Stout,  WI.)  (2012) 

(9)  Hegde,  J.  Department  of  Neurobiology,  Harvard  Medical  School,  Boston,  MA.  (2012) 

(10)  Hegde,  J.  “Visual  Search:  A  Comprehensive  Treatment”  workshop,  Washington,  D.C.  (2013) 

(11)  Hegde,  J.  IGERT  (Integrative  Graduate  Education  and  Research  Traineeship)  program,  Rutgers  Center  for  Cognitive  Science 
(RuCCS),  Rutgers,  The  State  University  of  New  Jersey,  Piscataway,  NJ.  (2014) 

(12)  Hegde,  J.  “Basic  Behavioral  Research  on  Multisensory  Processing”  OppNet,  National  Institutes  of  Health,  Bethesda,  MD.  (2014) 

(13)  Hegde,  J.  “Vision  Science  Problems  in  Medical  Images”  workshop,  National  Cancer  Institute,  Basic  Biobehavioral  and  Psychological 
Sciences  Branch,  National  Institutes  of  Health,  Rockville,  MD.  (2014) 

(14)  Hegde,  J.  “Role  of  statistical  learning  in  radiological  diagnosis  of  cancer.”  Symposium  on  “New  Insights  into  Medical  Image 
Perception  from  the  Vision  Science  Perspective”  sponsored  by  the  National  Cancer  Institute  (NCI),  USA.  Medical  Imaging  Processing 
Symposium  (MIPS),  XVI.  Ghent,  Belgium.  (2015) 


Number  of  Presentations:  15.00 


Non  Peer-Reviewed  Conference  Proceeding  publications  (other  than  abstracts): 


Received  Paper 


TOTAL: 


Number  of  Non  Peer-Reviewed  Conference  Proceeding  publications  (other  than  abstracts): 


Peer-Reviewed  Conference  Proceeding  publications  (other  than  abstracts): 


Received  Paper 


TOTAL: 


Number  of  Peer-Reviewed  Conference  Proceeding  publications  (other  than  abstracts): 


(d)  Manuscripts 


Received  Paper 


08/29/2013  8.00  Matthew  Maestri,  Jeffrey  Odel,  Jay  Hegde.  Semantic  Descriptor  Ranking:  A  Quantitative  Method  for 
Evaluating  Qualitative  Verbal  Reports  of  Visual  Cognition  in  the  Laboratory  or  the  Clinic, 

Frontiers  of  Psychology  (08  2013) 

12/05/2011 1.00 X. Chen, J. Hegde. Role of Background Learning in Learning to Break Camouflage,

(12  2011) 

12/05/2011 4.00 Jay Hegde, Serena Thompson, Mark Brady, Dan Kersten. Object Recognition in Clutter: Cortical
Responses  Depend  on  the  Type  of  Learning, 

Frontiers in Neuroscience (12 2011)


12/05/2011 5.00 Karin Hauffen, Eugene Bart, Mark Brady, Daniel Kersten, Jay Hegde. Creating Objects and Object
Categories  for  Studying  Perception  and  Perceptual  Learning, 

Journal  of  Visualized  Experiments  (12  2010) 


TOTAL:  4 


Number  of  Manuscripts: 


Books 


Received  Book 


TOTAL: 


Received 


TOTAL: 


(NA) 


Patents  Submitted 


[The PI has been trying, with the help of the Office of Innovation Commercialization of the Georgia Regents University, to
develop the aforementioned inventions to a patentable stage. So far, we haven't had much success, mostly because of a lack of
seed funding to support this development process.]

Patents  Awarded 


(NA) 


Awards 

•  2012  GRU  Emerging  Scientist  Award,  May  3,  2012. 

This University-wide annual award honors one researcher per year. In my case, the award recognized my "...
multidisciplinary, innovative research program with a clinical focus to study the neural mechanisms of visual function and
dysfunction, and to develop rehabilitative treatments for visual impairments." [From the award citation]. The award
included a $2000 cash prize.


Graduate  Students 


NAME 

PERCENT  SUPPORTED 

FTE  Equivalent: 

Total  Number: 

Names  of  Post  Doctorates 


NAME  PERCENT  SUPPORTED 

FTE  Equivalent: 

Total  Number: 


Names  of  Faculty  Supported 


NAME 

PERCENT  SUPPORTED 

FTE  Equivalent: 

Total  Number: 

Names  of  Under  Graduate  students  supported 


NAME 

PERCENT  SUPPORTED 

FTE  Equivalent: 

Total  Number: 

Student  Metrics 

This  section  only  applies  to  graduating  undergraduates  supported  by  this  agreement  in  this  reporting  period 

The  number  of  undergraduates  funded  by  this  agreement  who  graduated  during  this  period: . 2.00 

The  number  of  undergraduates  funded  by  this  agreement  who  graduated  during  this  period  with  a  degree  in 

science,  mathematics,  engineering,  or  technology  fields: . 2.00 

The  number  of  undergraduates  funded  by  your  agreement  who  graduated  during  this  period  and  will  continue 

to pursue a graduate or Ph.D. degree in science, mathematics, engineering, or technology fields: . 1.00

Number  of  graduating  undergraduates  who  achieved  a  3.5  GPA  to  4.0  (4.0  max  scale): . 2.00 

Number  of  graduating  undergraduates  funded  by  a  DoD  funded  Center  of  Excellence  grant  for 

Education, Research and Engineering: . 0.00

The  number  of  undergraduates  funded  by  your  agreement  who  graduated  during  this  period  and  intend  to  work 

for  the  Department  of  Defense . 0.00 

The  number  of  undergraduates  funded  by  your  agreement  who  graduated  during  this  period  and  will  receive 

scholarships or fellowships for further studies in science, mathematics, engineering or technology fields: . 1.00


Names  of  Personnel  receiving  masters  degrees 

NAME 

Total  Number: 


Names  of  personnel  receiving  PHDs 

NAME 

Total  Number: 


Names  of  other  research  staff 


NAME 

PERCENT  SUPPORTED 

FTE  Equivalent: 

Total  Number: 

Sub  Contractors  (DD882) 


Inventions  (DD882) 


Scientific  Progress 
We sincerely appreciate ARO approving the no-cost extension of this project, which allowed us to collect additional confirmatory
data on the main experiments and to carry out additional control experiments to shore up our conclusions. These conclusions
have been slowly emerging over the course of the last couple of years, but our work during the no-cost extension period allowed
us to fully confirm them.

The main outcome of the three aims of the project enumerated in the Abstract section above (camouflage-breaking of
whole  objects,  learning  to  break  the  camouflage  of  whole  objects,  and  camouflage-breaking  of  occluded  objects)  is  that  it  is  the 
particular  pattern  of  distributed  network  activity  that  distinguishes  camouflage-breaking  from  other  forms  of  visual  search  and 
visual  perception.  In  many  (although  not  all)  cases,  the  pattern  of  network  activity  is  diagnostic  of  a  particular  type  of 
camouflage-breaking  on  a  trial-to-trial  basis. 

The  aforementioned  distributed  neural  network  contains  29  distinct  brain  regions,  spread  over  all  four  lobes  of  the  cerebral 
hemisphere  (with  substantial  inter-hemispheric  asymmetry),  along  with  the  thalamus  and  the  basal  ganglia  (specifically  the 
subthalamic  nucleus). 

While  the  aforementioned  network  was  identified  based  on  the  fMRI  alone,  the  combined  EEG-fMRI  experiments  have 
essentially confirmed the results of fMRI alone. Moreover, EEG has helped improve the temporal resolution of the results
from fMRI alone. In particular, we have found that the temporal pattern (not to mention the magnitude of the BOLD response) of
several small clusters of brain regions matches well with the timing of the subject's camouflage-breaking on a trial-to-trial
basis. One of these clusters, in the superior temporal sulcus (STS), is selectively active when the subjects succeed in breaking
the camouflage of an object of interest (e.g., a human face). Another cluster is near the occipito-parietal junction, and includes the
following regions: posterior intraparietal sulcus (IPSp), precuneus, angular gyrus and dorsal precentral gyrus (PCGd).

Intriguingly, two additional clusters of areas, one near the occipital pole and the other in the medial aspect of the frontal lobe,
are active when the subjects correctly decide that the scene in question
does NOT contain a camouflaged object of interest. This is potentially highly significant, because such brain regions can, in
principle, reduce false alarms by signaling where there is no target to be found in a given scene.

We  emphasize  --  this  is  a  standard  caveat  applicable  to  all  neuroimaging  studies  that  use  fMRI  and/or  EEG,  and  not  specific  to 
our  study  --  that  our  results  do  NOT  rule  out  the  possibility  that  there  may  be  other  regions  in  the  brain  that  also  play 
comparable  roles,  but  whose  activity  is  below  the  resolution  of  either  EEG,  fMRI  or  both.  This  caveat  notwithstanding,  it  is 
noteworthy  that  the  activity  (and  the  timing)  of  the  response  of  each  of  these  regions  is,  by  itself,  significantly  diagnostic  of 
camouflage-breaking,  or  lack  thereof,  at  the  perceptual  level. 
For scientific, clinical, and machine learning purposes alike, it is desirable to quantify the
verbal  reports  of  high-level  visual  percepts.  Methods  to  do  this  simply  do  not  exist  at 
present.  Here  we  propose  a  novel  methodological  principle  to  help  fill  this  gap,  and  provide 
empirical  evidence  designed  to  serve  as  the  initial  "proof"  of  this  principle.  In  the  proposed 
method,  subjects  view  images  of  real-world  scenes  and  describe,  in  their  own  words,  what 
they  saw.  The  verbal  description  is  independently  evaluated  by  several  evaluators.  Each 
evaluator  assigns  a  rank  score  to  the  subject's  description  of  each  visual  object  in  each 
image  using  a  novel  ranking  principle,  which  takes  advantage  of  the  well-known  fact  that 
semantic  descriptions  of  real  life  objects  and  scenes  can  usually  be  rank-ordered.  Thus, 
for  instance,  "animal,"  "dog,"  and  "retriever"  can  be  regarded  as  increasingly  finer-level, 
and  therefore  higher  ranking,  descriptions  of  a  given  object.  These  numeric  scores  can 
preserve  the  richness  of  the  original  verbal  description,  and  can  be  subsequently  evaluated 
using  conventional  statistical  procedures.  We  describe  an  exemplar  implementation  of  this 
method  and  empirical  data  that  show  its  feasibility.  With  appropriate  future  standardization 
and  validation,  this  novel  method  can  serve  as  an  important  tool  to  help  quantify  the 
subjective  experience  of  the  visual  world.  In  addition  to  being  a  novel,  potentially  powerful 
testing  tool,  our  method  also  represents,  to  our  knowledge,  the  only  available  method  for 
numerically representing verbal accounts of real-world experience. Given its minimal
requirements, i.e., a verbal description and the ground truth that elicited the description,
our method has a wide variety of potential real-world applications.


Keywords: qualitative research, natural language processing, semantic processing, visual cognition, neuropsychological tests


INTRODUCTION 

In real-world situations, our perception of visual scenes tends to
be complex and nuanced, with rich semantic content. Capturing
this complexity is critical not only for the study and treatment of
visual dysfunction, but also for the study of normal visual function.
For practical reasons, the available quantitative tests of visual
perception tend to use relatively simple visual stimuli and tasks
that constrain the responses of the test subject (e.g., contrast
sensitivity test, line bisection test, star cancellation test), so that the
responses can be precisely measured and quantitatively analyzed
(Green and Swets, 1974; Gescheider, 1997; Lezak, 2012).

The  importance  and  usefulness  of  traditional  quantitative  tests 
in  research  and  clinical  settings  is  indisputable.  But  it  is  also  clear 
that  quantitative  tests  of  visual  perception  have  a  major  drawback, 
in  that  they  fail  to  capture  the  complexity  of  visual  function  and 
dysfunction  in  real  life.  That  is,  the  complex,  qualitative  nature 
of  normal  high-level  visual  perception  under  real  life  conditions  is 
all  but  impossible  to  measure  using  the  available  quantitative  tests. 
Impairments of high-level visual perception are similarly hard to
measure.


At the other end of the visual testing spectrum, qualitative tests
of visual function have a roughly complementary set of strengths
and weaknesses, in that while they are much better at capturing
the nuances of high-level vision under real-world conditions, the
outcomes of these tests are hard to quantify (Miles and Huberman,
1994; Poreh, 2000; Ogden-Epker and Cullum, 2001). Imagine, for
instance, a clinical provider trying to quantify the visual deficit in
a patient with agnosia, or inability to recognize objects. A typical
test is to show the patients drawings of everyday objects,
such as a pen, mug etc., and ask them to redraw and name it.
Patients with a clear-cut apperceptive agnosia fail both to draw
and name the object, whereas patients with clear-cut associative
agnosia generally are able to draw the object, but not to
name it. Even when the outcome of the test is as clear-cut as this,
it is hard to measure the quality and the completeness of the
drawings and naming. Moreover, the actual clinical outcomes
are rarely as clear-cut, with most patients showing symptoms
that cannot be neatly pigeonholed into either of the above two
extremes (Atkinson and Adolphs, 2011; Barton, 2011; Mitchell,
2011). Furthermore, the outcomes of this test are affected by an


www.frontiersin.org 


March 2014 | Volume 5 | Article 160 | 1


Maestri et al.


Semantic  descriptor  ranking 


array of complexities of agnosia. Thus, while the test outcomes are
rich in qualitative information, it is hard to measure this information.
This is a well-documented shortcoming of qualitative
tests in general (Gainotti et al., 1985, 1989; Milberg et al., 1996;
Glozman, 1999; Ogden-Epker and Cullum, 2001; Pachalska et al.,
2008).

Quantifying qualitative reports would effectively meld the best
of both worlds, by combining the ability of the qualitative methods
to capture the richness of the visual experience in the real-world
with the scientific rigor of the quantitative methods. A large
number of such methods have been developed, with applications
in clinical care, educational testing, machine learning and
scientific research (for reviews, see Udupa, 1999; Auerbach and
Silverstein, 2003; Gustafson and McCandless, 2010; Sauro and
Lewis, 2012; Bazeley, 2013). While a review of this large and
diverse literature is beyond the purview of the present report,
two aspects of the quantification process are particularly worth
noting. First, the existing methods generally require that the
qualitative report be formatted or structured (e.g., questionnaires),
so as to streamline the quantification process. That is,
the underlying qualitative reports are generally not open-ended.
Second, to our knowledge, no methods exist in clinical, psychophysical
or machine learning literature for creating a numeric
representation of verbal reports. This latter issue is particularly
relevant when dealing with real-world visual percepts, which have
a rich semantic content (Miles and Huberman, 1994; Glozman,
1999; Poreh, 2000; Zipf-Williams et al., 2000; Joy et al., 2001;
Ogden-Epker and Cullum, 2001).

In this report, we propose a novel methodological principle
that will help address both of the aforementioned shortcomings
of the currently available approaches, and is well suited to complement
(albeit not replace) the rich array of available methods.
Our method, which we will refer to as semantic descriptor ranking
(SDR), allows quantification of open-ended, verbal reports of
visual scenes. We illustrate its implementation using perceptual
reports of complex real-world scenes by healthy subjects. As noted
above, the present report only aims to provide a proof of concept
of the proposed method, i.e., that the proposed method is feasible.
Our implementation will also help highlight issues involved in
the future development and refinement of the proposed method,
including its standardization and validation (Glozman, 1999;
Ogden-Epker and Cullum, 2001; Chiappelli, 2008; Salkind, 2010).

MATERIALS  AND  METHODS 

PARTICIPANTS 

Fourteen different volunteer adults (6 females) participated in
one or both of the two experiments that constituted this study.
Subjects were 19 to 31 years of age (median age, 24 years).
In either experiment, some participants participated as subjects
who viewed the stimuli and reported their percepts, and others
participated as evaluators, who scored the subjects' reported
percepts. No one participated both as a subject and as an
evaluator. That is, no one who participated in either experiment
as a subject also participated in either experiment as an
evaluator, or vice versa. All subjects had normal or corrected-to-normal
vision, with no known neurological or psychiatric
disorders.

Experiment  1  consisted  of  six  subjects  and  two  evaluators, 
and  Experiment  2  consisted  of  eight  subjects  and  two  evaluators. 
All  participants  gave  informed  consent  prior  to  participating  in 
the  study.  All  protocols  used  in  the  experiment  were  approved 
in  advance  by  the  Human  Assurance  Committee  at  the  Georgia 
Regents  University,  where  this  study  was  carried  out. 

VISUAL  STIMULATION 

In Experiment 1, 50 different real-world photographs from the
Corel Stock Photo Library (Corel Corporation, Ottawa, ON,
Canada) were used as visual stimuli in this study (see, e.g.,
Figures 1 and 3). Subjects sat comfortably approximately 30 cm in
front of a computer monitor in a normally lit room (ambient luminance,
14.6 cd/m2). Each trial started when the subject indicated
readiness by pressing a button. The visual stimulus was presented

[Figure 1 flowchart]

Step 1. Subject views the image and describes it freely in his/her own words.

Sub-step 2a. Evaluator identifies image descriptors, e.g., "Dog" (Baseline score: 10).

Sub-step 2b. Evaluator compares subject's image descriptors with his/her own image descriptors:

•  Same hierarchical level as the evaluator's image descriptors, e.g., "Dog". Score: 10
•  Higher level than evaluator's image descriptors, e.g., "Retriever". Score: >10
•  Lower level than evaluator's image descriptors, e.g., "Animal". Score: <10
•  Subject fails to report image element. Score: 0 (Miss Rule)
•  Subject misidentifies image element, e.g., "Border Collie". Deduct >=1 from 10 (False Alarm Rule)

Step 3. Experimenter carries out post-hoc statistical analyses of evaluators' scores.


FIGURE 1 | Workflow of SDR. The three main steps of SDR, which involve
obtaining, scoring and analyzing the subject's reports respectively, are shown.
Note that the subject, the evaluator, and the experimenter play the most
prominent role in Steps 1, 2, and 3, respectively. The two sub-steps of Step 2
in which the evaluator scores the subject's reported percept are illustrated
here using a hypothetical exemplar image (not shown) in which one of the
image elements is a dog. Sub-steps 2a and 2b are repeated for each image
element in each image (not shown). See text for details.
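
The scoring rules in Figure 1 can be sketched in code as follows. This is only an illustrative sketch, not the evaluators' actual procedure: the function name `sdr_score`, the `level_offset` argument, and the choice of one point per hierarchical level are assumptions made here, since the figure only specifies that scores go above or below the baseline of 10.

```python
from typing import Optional

BASELINE = 10  # baseline score for a descriptor at the evaluator's own level (Figure 1)


def sdr_score(level_offset: Optional[int],
              misidentified: bool = False,
              penalty: int = 1) -> int:
    """Score one image element in one report (hypothetical helper).

    level_offset: subject's descriptor level minus the evaluator's level
                  (0 = same level, positive = finer, negative = coarser);
                  None means the subject did not report the element.
    misidentified: True if the subject named the element incorrectly.
    penalty: points deducted under the False Alarm Rule (at least 1).
    """
    if level_offset is None:
        return 0  # Miss Rule: element not reported
    if misidentified:
        if penalty < 1:
            raise ValueError("False Alarm Rule deducts at least 1 point")
        return BASELINE - penalty  # False Alarm Rule
    # Assumption: one point per hierarchical level above or below the baseline.
    return BASELINE + level_offset
```

With these assumptions, a report of "Retriever" against an evaluator descriptor of "Dog" would be scored with `sdr_score(1)`, and an unreported element with `sdr_score(None)`.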
[Figure 2 timeline: Trial start -> Stimulus (17 ms or 50 ms) -> Mask (100 ms) -> Subject's Report (unlimited duration)]


FIGURE 2 | Trial paradigm used in our implementation of Step 1. Each trial
started when the subject indicated readiness. The visual stimulus
(a real-world image) was presented for 17 ms or 50 ms, depending on the
trial. To minimize the contribution of stimulus repetition on the subject's
reports, each given image was presented for the longer stimulus duration first, as
described in Materials and Methods. After the 100 ms mask, the subject was
allowed unlimited time to describe, in his/her own words, what he/she
perceived in the stimulus. The figure is not drawn to scale.


for 50 ms or 17 ms, depending on the condition, followed by a
random dot mask (Figure 1). These two stimulus durations correspond
to 3 frames or 1 frame, respectively, of the computer monitor at a
screen refresh rate of 60 Hz.
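
As a quick check of the relation between frame counts and stimulus durations, the arithmetic can be written out as below. This is a minimal sketch; the names `REFRESH_HZ` and `frames_to_ms` are introduced here for illustration only.

```python
REFRESH_HZ = 60  # screen refresh rate stated in the text


def frames_to_ms(n_frames: int) -> float:
    """Duration in milliseconds of n_frames video frames at REFRESH_HZ."""
    return n_frames * 1000.0 / REFRESH_HZ


for n in (1, 3):
    # One frame lasts about 16.7 ms (reported as 17 ms); three frames last 50 ms.
    print(n, "frame(s):", round(frames_to_ms(n), 1), "ms")
```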

Trials were presented in a pseudo-random order. To minimize
the contribution of stimulus repetition on the subject's reports
(Ochsner et al., 1994; Maxfield, 1997; Gauthier, 2000; Henson,
2003; Holcomb and Grainger, 2007; Kristjansson and Campana,
2010), we ensured that the 50 ms viewing of a given stimulus
preceded its 17 ms viewing within the pseudo-random sequence
of trials.
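
One way to generate such a constrained sequence is sketched below. This is an assumption about how the ordering could be implemented, not a description of the software actually used in the study; the function name `make_trial_sequence` and the representation of trials as `(image_index, duration_ms)` pairs are hypothetical.

```python
import random


def make_trial_sequence(n_images, seed=None):
    """Return a pseudo-random list of (image_index, duration_ms) trials in which
    the 50 ms viewing of each image precedes its 17 ms viewing."""
    rng = random.Random(seed)
    # Each image occupies two slots in the sequence; shuffle the slot order.
    slots = [img for img in range(n_images) for _ in range(2)]
    rng.shuffle(slots)
    seen = set()
    trials = []
    for img in slots:
        # The first slot for an image gets the longer duration, the second the shorter.
        duration = 17 if img in seen else 50
        seen.add(img)
        trials.append((img, duration))
    return trials
```

For the 50 images of one experiment, `make_trial_sequence(50, seed=0)` would yield 100 trials that satisfy the ordering constraint by construction.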

The stimulus subtended 9° x 6° (for landscape format pictures;
the reverse for portrait format pictures), and had an average
luminance of 30.2 cd/m2, and was presented against a uniform
gray screen of the same mean luminance. The mask had
the same average luminance and subtended 9° x 9°. Following
the mask, the subject had unlimited time to orally describe,
in his/her own words and with no prompting or feedback,

[Figure 3 panels: (A) Stimulus used in Step 1. (B) Percepts of the stimulus reported by Subject00 in Step 1, for stimulus durations of 50 ms and 17 ms. (C) Scoring of the reports of Subject00 by Evaluator00 in Step 2. (D) Stimulus used in Step 1. (E) Percepts of the stimulus reported by Subject02 in Step 1, for stimulus durations of 50 ms and 17 ms. (F) Scoring of the reports of Subject02 by Evaluator01 in Step 2.]
FIGURE 3 | Two instances of implementation of SDR steps 1 and 2.
Panels A through C show the scoring of one set of verbal reports, and
panels D through F show the scoring of a second, independent set. (A)
Stimulus. Subject viewed the stimulus for 50 ms and 17 ms respectively
in randomly interleaved trials, so that subject reports were paired across
the experiment. Note that even though each subject viewed the same
stimulus twice, the longer viewing duration always preceded the shorter
viewing duration, so as to counteract priming effects, if any (see text for
details). (B) Percepts of the stimulus in panel A as reported by a
subject for each of the two stimulus durations. (C) Scoring of the
subject's reports in panel B by the evaluator. Note that, although the
subject's descriptions of the building were spread over multiple
sentences, the evaluator grouped them together into a single descriptor,
in accordance with the scoring rules. Columns corresponding to various
image elements are highlighted in different colors solely to enhance
visibility. (D-F) Scoring of a different pair of reports.


www.frontiersin.org 


March  2014  |  Volume  5  |  Article  160  |  3 


Maestri et al.


Semantic  descriptor  ranking 


what  he/she  saw  in  the  visual  stimulus.  The  description  was 
audio-recorded. 

Experiment  2  was  identical  to  Experiment  1  except  that  it  used 
a  different,  non-overlapping  set  of  50  images,  and  a  different,  but 
partially  overlapping  set  of  subjects  and  evaluators. 

RATIONALE  BEHIND  SDR 

Semantic descriptor ranking takes advantage of the fact that our
semantic understanding, and therefore the reported percept, of
visual objects tends to have a naturally hierarchical structure: a
large number of previous studies have shown that our understanding
of real-world objects generally (although not always, see
Discussion) follows a hierarchical pattern of categories (Rosch,
1973; Hanson and Hanson, 2005; Hegde, 2008). For instance,
a particular pet dog named “Spike” can be thought of, in
order of increasingly fine categorization, as an object, an animal,
a mammal, a dog, a retriever, a Golden retriever, and
finally as a particular dog named Spike. This hierarchical organization
lends itself to ranking, so that the above descriptors can
be rank-ordered, in increasing order of specificity, as object < animal
< mammal < dog < retriever < Golden retriever < Spike.
Similarly, “brown dog” can be reasonably considered a more specific,
and therefore higher ranking, description than “dog.” These
ranks can be analyzed using established rank-based statistical
methods.
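
As an illustration only, the rank ordering in the "Spike" example can be
sketched in a few lines of Python. The list `HIERARCHY` and the function
`specificity_rank` are hypothetical names introduced here; the hierarchy
itself is simply the example chain from the text, not a general taxonomy.

```python
# Hypothetical hierarchy, ordered from least to most specific,
# taken from the "Spike" example in the text.
HIERARCHY = [
    "object", "animal", "mammal", "dog",
    "retriever", "golden retriever", "spike",
]


def specificity_rank(descriptor, hierarchy=HIERARCHY):
    """Return the 1-based rank of a descriptor; higher means more specific."""
    return hierarchy.index(descriptor.lower()) + 1


# Rank-order a handful of descriptors by increasing specificity.
reported = ["Dog", "Object", "Golden retriever", "Animal"]
ordered = sorted(reported, key=specificity_rank)
```

Because only the ordering matters for rank-based statistics, any
monotonic relabeling of these ranks would serve equally well.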

Given  that  ranking  semantic  “tags,”  or  descriptors,  is  central 
to  our  method,  we  refer  to  it  as  SDR.  We  use  the  term  “semantic 
descriptor”  to  mean  a  word  or  phrase  (i.e.,  a  verbal  “tag”)  that 
describes  a  given  object,  to  distinguish  it  from  the  term  “[image] 
descriptor”  commonly  used  in  machine  vision,  which  generally 
refers  to  various  lower-level  properties  of  the  image,  such  as  color, 
texture,  or  local  shape  (Lowe,  1999;  Mikolajczyk  and  Schmid,  2001; 
Belongie et al., 2002; Manjunath et al., 2002; Pollefeys and Gool,
2002; Lazebnik et al., 2005; Snavely et al., 2006).

IMPLEMENTATIONS  OF  SDR:  VARIATIONS  OF  A  THEME 

A  typical  implementation  of  SDR  would  consist  of  the  following 
three  steps,  in  order  (see  Figure  1;  also  see  below):  (1)  Subjects 
freely  view  pictures  of  real-world  scenes  and  describe  in  their  own 
words  what  they  see.  (2)  A  set  of  independent  evaluators  examine 
each  subject’s  reports  and  rank  the  descriptors  according  to  how 
specific  the  descriptors  are.  Since  each  descriptor  will  be  assigned 
a rank score, the report as a whole will typically consist of multiple
rank scores. Collectively, these rank scores are a numeric
representation of the verbal report. (3) The experimenters analyze
the numeric representations using conventional statistical
methods. 

Note that a large number of variations on the above theme are
possible; one can customize SDR for a given purpose by appropriately
varying one or more of the above three steps. Indeed, the only
two crucial requirements of SDR are that (a) the reports be verbal
(i.e., spoken or written), and (b) the image or scene underlying the
report be available for independent evaluation (i.e., the evaluator
be able to see what the subject is seeing).

With these minimum requirements met, one can create a
numeric representation of a given perceptual report of interest
(“query representation”) and appropriately compare it to a reference
of some sort. Note that this reference can be arrived at by any
of a large number of possible principled methods. For instance,
the reference representation can be obtained using the same subject
viewing the same image under a different viewing condition
(e.g., different stimulus duration, see below). Similarly, for a
hemineglect patient, the query and reference representations can
be obtained using stimulus presentations in the affected and spared
hemisphere, respectively. Each of these instances makes for a
two-sample, within-subject paired design, where the query and
reference representations constitute the two samples. Alternatively,
one can use a one-sample design, where the query representation
from one subject is compared against an existing reference
sample obtained from, say, a large number of other subjects. Note
that the query and/or reference representations can, in principle,
be obtained using machine vision algorithms, rather than human
subjects (see Discussion).

RESULTS 

AN  ILLUSTRATIVE  IMPLEMENTATION  AND  PROOF  OF  PRINCIPLE  OF 
SDR 

We will illustrate the use of SDR using a two-sample, within-subject
paired design that compared the verbal reports of each
given subject on the same set of images using two different stimulus
durations. This design exploits the previously known fact that,
in general, longer viewing of visual stimuli elicits finer-grained
perception than briefer viewing (Sugase et al., 1999; Liu et al.,
2002; Grill-Spector and Kanwisher, 2005; Hegde, 2008; also see
Discussion).

We  carried  out  two  experiments.  Experiment  1  compared  the 
reports  elicited  by  the  viewing  of  the  same  set  of  real-world  scenes 
for  long  vs.  brief  durations  (50  ms  vs.  17  ms,  respectively;  see 
Materials  and  Methods  for  details).  It  tested  the  hypothesis,  using 
SDR,  that  the  responses  elicited  by  the  50  ms  viewing  collectively 
will  have  higher  rankings  than  the  responses  elicited  by  the  17  ms 
viewing. 

STEP  1:  OBTAINING  QUALITATIVE  REPORTS  FROM  THE  SUBJECTS 

Subjects  viewed  natural  images  one  per  trial,  presented  for  either 
50  ms  or  17  ms,  depending  on  the  trial  (Figure  2;  see  Methods  for 
details).  After  a  brief  mask,  the  subjects  were  allowed  unlimited 
time  to  describe,  in  their  own  words  and  ad  libitum,  what  they  saw 
in  the  stimulus.  The  subjects’  reports  were  audio-recorded. 

Each subject viewed each image twice, first for the longer
stimulus duration and then for the shorter duration, in blocks
of randomly interleaved trials (see Methods). The rationale for
always first viewing the image for the longer duration was to
minimize the contributions of priming/exposure effects, where
a previously viewed stimulus tends to elicit better recognition
during subsequent viewing (Ochsner et al., 1994; Maxfield, 1997;
Gauthier, 2000; Henson, 2003; Holcomb and Grainger, 2007;
Kristjansson and Campana, 2010). Note that this meant that
the priming/exposure effects would actually tend to counteract,
i.e., reduce, the expected increase in rankings upon longer
stimulus viewing. Thus, our method would have to find
duration-dependent effects, if any, over and above the counteracting
effects of priming.

STEP 2: INDEPENDENT RANKING OF THE SUBJECT'S REPORTS BY
EVALUATORS

In this step, each evaluator separately ranked the descriptors
used by the subjects in their oral reports. This is the crucial
step in SDR, in which the qualitative reports of the subjects are
converted into quantitative measures.

Before  the  evaluations  began,  the  evaluators  received  extensive 
training  in  the  relevant  procedures.  In  addition  to  the  routine 
scoring  procedures  (outlined  in  Figures  1  and  3),  we  devised  a 
set  of  somewhat  arbitrary,  but  principled,  evaluation  rules  for 
handling  special  cases  (some  of  which  are  shown  in  Table  1)  in 
order  to  help  ensure  that  these  cases  were  handled  as  consistently 
as  possible.  Note  that  the  evaluation  rules  can  be  customized  for 
each  given  application  of  SDR.  Note  also  that  it  is  possible,  in 
principle,  to  write  computer  programs  to  automate  the  evaluation 
process. 

Each subject’s reports were scored by multiple evaluators
independently of each other and of the subject. The scoring process
consisted of two sub-steps (Figures 1 and 3C,F). Sub-step 2a
consisted of the evaluator’s own offline analysis of each image prior
to evaluating the subject’s reports, in which each evaluator viewed
each given image ad libitum, and wrote down as many
semantic descriptors as he/she thought were needed to capture
what was in the image. Each descriptor was assigned an arbitrary
baseline value of 10. It is important to emphasize that the
absolute value of the baseline score (or of other scores for that
matter, see below) is unimportant; any value that allows sufficient
room for deductions and bonuses (i.e., sufficient spread
from the baseline) will suffice. That is, what matters in our
particular implementation of SDR are the relative scores, rather
than the absolute scores, since our implementation ultimately
uses rank statistics (see Discussion for other implementation
options). For the same reason, the absolute hierarchical level of
the descriptor (“Dog” vs. “Golden retriever”) that a given evaluator
comes up with [which, among other things, depends on the
expertise of the evaluator (Rosch, 1973; Palmeri and Gauthier,
2004); also see Discussion] does not matter in the present context
either.

In  Sub-step  2b,  the  evaluators  listened  to  the  audio  recording  of 
the  subject’s  perceptual  report  of  the  same  stimulus,  and  scored  the 
subject’s  descriptions  of  the  image  relative  to  the  evaluator’s  image 
descriptors  from  Sub-step  2a  according  to  a  set  of  pre-specified  rules 
(see Table 1). If the subject’s descriptor was deemed to be essentially
the same as the corresponding descriptor of the evaluator
(e.g.,  “dog”),  the  subject’s  report  for  the  given  image  descriptor 
was  also  assigned  the  baseline  value. 

If the subject’s description was more specific (“Golden
retriever”) than that of the evaluator, the subject’s description
was assigned a correspondingly higher score. The exact decrement
or increment of the score was up to the evaluator, but he/she
was required to be consistent about it across subjects. For instance,
“Golden retriever” can be reasonably considered one, or two, ranks
higher in terms of the level of categorization than “Dog”, depending
on whether the evaluator recognizes an intermediate category
of “Retriever.” Similarly, if the subject’s descriptor was less specific
(“animal”) than the evaluator’s corresponding descriptor (“dog”),
the subject’s report was given a correspondingly lower score.
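
The baseline-relative scoring described above can be sketched as
follows. This is a minimal illustration under stated assumptions, not
the procedure the evaluators actually followed: the evaluator's
hierarchy is assumed to be an explicit ordered list, each level is
assumed to be worth one point, and the names `score_descriptor` and
`HIERARCHY` are hypothetical.

```python
BASELINE = 10  # arbitrary baseline value used in the text

# Hypothetical evaluator-specific hierarchy, least to most specific.
HIERARCHY = ["object", "animal", "mammal", "dog",
             "retriever", "golden retriever"]


def score_descriptor(subject_term, evaluator_term,
                     hierarchy=HIERARCHY, baseline=BASELINE):
    """Score the subject's descriptor relative to the evaluator's.

    The score is the baseline plus the number of hierarchy levels by
    which the subject's term is more (positive) or less (negative)
    specific than the evaluator's term.
    """
    levels = {name: depth for depth, name in enumerate(hierarchy)}
    step = levels[subject_term.lower()] - levels[evaluator_term.lower()]
    return baseline + step


same = score_descriptor("dog", "dog")
finer = score_descriptor("Golden retriever", "dog")
coarser = score_descriptor("animal", "dog")

# An evaluator who does not recognize the intermediate category
# "retriever" would place "Golden retriever" only one rank above "dog".
no_retriever = ["object", "animal", "mammal", "dog", "golden retriever"]
finer_alt = score_descriptor("Golden retriever", "dog",
                             hierarchy=no_retriever)
```

The two hierarchies give different absolute scores for the same report,
which is acceptable as long as a given evaluator applies one hierarchy
consistently, since only relative scores enter the rank statistics.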


Table  1  |  Selected  special  case  rules. 
Rules 


1.  Objects  (i.e.,  nouns,  such  as  "dog")  are  primary  descriptors,  while 
adjectives/modifiers  such  as  colors  (e.g.,  "black")  are  secondary 
descriptors.  Descriptions  with  correct  primary  and  secondary 
descriptors  should  receive  higher  ranking  than  descriptions  with  a 
correct  primary  descriptor  but  without  a  secondary  descriptor. 

2.  If  the  primary  descriptor  is  correct,  but  the  secondary  descriptor  is 
wrong,  award  the  appropriate  points  for  the  correct  primary 
descriptor,  and  simply  ignore  the  incorrect  secondary  descriptor, 
but  do  not  deduct  points  for  it. 

For example, if the stimulus contains a red car, and the subject's
report describes a red car, then award the points for the correct
primary descriptor plus a bonus point for the correct secondary
descriptor. But if the subject reports a blue car, simply withhold
the bonus point, but do not deduct from the points you were going
to award for the correct primary descriptor. The reason for this
rule is to ensure that, in the above case for instance, "blue car"
does not receive fewer points than simply "car."

3.  Miss  Rule.  If  an  object  is  present  in  the  image,  but  it  is  not  reported, 
then  award  a  score  of  0  for  that  descriptor. 

4. False Alarm Rule. If an object that is not present in the image is
reported, then assess a penalty of -1. For example, if the subject
reports a car when, in fact, there is no car in the picture, then the
score should be reduced by 1. Also assess a penalty if an object is
reported as something else entirely. For example, if the image
contains a tree and the subject reports a building instead of a tree,
then a penalty of -1 should be assessed.

5. If there is more than one object of the same kind (e.g., more than
one person), award a bonus of +1 for each additional person
recognized. However, there is no penalty if the subject does not
report all the persons in the image. The following are just two
examples; the rule applies to any type of object.

Example 1: An image has three dogs. The subject reports three
dogs. The score should be 10 + 1 + 1 = 12: the default score of 10
for the first dog recognized, plus 1 point for each additional dog.

Example 2: An image has three dogs. The subject reports one dog.
The report still receives the standard 10 for recognition of a dog,
with no penalty for not identifying the rest.

6.  In  those  cases  where  the  secondary  descriptor  is  redundant  with 
the  primary  descriptor  (e.g.,  "blue  sky,"  "green  grass")  do  not 
award  extra  points  for  the  secondary  descriptor.  When  the 
secondary  descriptor  is  not  redundant  (e.g.,  the  stimulus  contains 
brown  grass),  award  bonus  points  for  correct  secondary  descriptor 
(in  this  case,  "brown"). 
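
The selected rules in Table 1 can be expressed compactly in code. The
following is a sketch under stated assumptions rather than the scoring
program used in the study (the evaluators scored by hand): the function
name `score_object`, its parameters, and the one-point size of the
secondary-descriptor bonus are illustrative choices.

```python
BASELINE = 10        # arbitrary baseline value from the text
SECONDARY_BONUS = 1  # assumed size of the bonus for a correct modifier


def score_object(present, reported, n_recognized=1, secondary=None):
    """Score one image element following the selected rules in Table 1.

    present      -- True if the object is actually in the image
    reported     -- True if the subject reported it
    n_recognized -- number of objects of this kind correctly reported
    secondary    -- None, "correct", "incorrect", or "redundant"
    """
    if not present and not reported:
        raise ValueError("nothing to score")
    if present and not reported:
        return 0    # Rule 3 (Miss Rule)
    if reported and not present:
        return -1   # Rule 4 (False Alarm Rule)
    # Rule 5: +1 for each additional object of the same kind recognized.
    score = BASELINE + max(n_recognized - 1, 0)
    if secondary == "correct":
        score += SECONDARY_BONUS  # Rules 1 and 2
    # "incorrect" (Rule 2) and "redundant" (Rule 6) modifiers earn no
    # bonus and incur no deduction.
    return score


three_dogs = score_object(True, True, n_recognized=3)    # Example 1
one_of_three = score_object(True, True, n_recognized=1)  # Example 2
red_car = score_object(True, True, secondary="correct")
blue_car = score_object(True, True, secondary="incorrect")
```

With these choices, "blue car" and plain "car" receive the same score,
which is the stated purpose of Rule 2.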

If the subject failed to report a given object altogether, the
given image descriptor was assigned a value of 0 (“Miss Rule” in
Figure 1; also see Table 1). If the subject misidentified an image
element (e.g., when a Golden retriever was identified as “Border
Collie”), the subject was penalized one or more points according
to the hierarchical level of the reported identifier (“False Alarm
Rule”). That is, in this example the subject was awarded the
appropriate score for having recognized that it was a dog, and was
then penalized 1 point for misidentifying the breed. Note that while
this is a somewhat arbitrary rule, it is also principled, and has
considerable precedent (Green and Swets, 1974; Geissler et al.,
1992). Note, in any event, that the drawbacks of our implementation,
such as they may be, are not the drawbacks of SDR per se.
Investigators can take advantage of the basic SDR principle but
nonetheless devise their own set of implementation rules.

An actual image used in Experiment 1 is shown in Figure 3A.
The reports of one subject after viewing it for 50 ms and 17 ms
are shown in Figure 3B. The corresponding scoring of the two
reports by a typical evaluator, using our scoring method, is shown
in Figure 3C. Note that, as expected, the report for the longer
duration elicited ratings equal to, or better than, the baseline
scores for all image identifiers, whereas with the shorter duration,
the subject missed a few image identifiers. Thus, the scoring method
did reveal that longer viewing also produced a finer-grained percept
of the image. Figures 3D-F illustrate the reports of another subject
for a different image and the corresponding scores assigned by
a different evaluator. The scores were lower for the shorter stimulus
duration, because the subject misidentified the snowy background
in this case.

Finally, to help account for individual differences across subjects
and evaluators, we repeated the first three steps independently
across multiple subjects and evaluators (Reynolds and Willson,
1985; Miles and Huberman, 1994; Milberg et al., 1996; Poreh,
2000; Ogden-Epker and Cullum, 2001). Some of the representative
results are shown in Figure 4A in a color-coded format. In general,
subjects’ reports for the longer stimulus duration elicited higher
scores than their reports of the same image for the shorter viewing
duration, as denoted by the fact that there were a greater number of
descriptors and more of the descriptors had higher-than-baseline
values (i.e., greener cells) in Figure 4A.

STEP 3: Post hoc STATISTICAL ANALYSES OF THE EVALUATORS' SCORES

We tested the numerical scores produced by the evaluators using
the conventional paired two-sample Mann-Whitney test. As noted
above, based on previous studies, we expect a priori that longer
stimulus durations produce finer-grained percepts (Liu et al., 2002;
Grill-Spector and Kanwisher, 2005; Xu et al., 2005; but see Mack
et al., 2008, 2009). For each of the three subjects and either
evaluator, 50 ms viewing of the images did elicit significantly
finer-level categorization (one-tailed paired Mann-Whitney,
p < 0.05 in all cases).
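
A one-tailed, paired, rank-based comparison of this kind can be
sketched in plain Python. Note that the sketch below uses the Wilcoxon
signed-rank test, a standard rank-based test for paired samples; it is
offered as an illustration and is not necessarily the exact procedure
used to obtain the results reported above. The function names and the
example scores are hypothetical.

```python
from collections import Counter


def midranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_one_sided(long_scores, short_scores):
    """Wilcoxon signed-rank statistic W+ and exact one-sided p-value.

    Tests whether long_scores tend to exceed short_scores. Pairs with
    zero difference are dropped. The null distribution is enumerated
    exactly over all sign assignments of the observed ranks.
    """
    diffs = [a - b for a, b in zip(long_scores, short_scores) if a != b]
    if not diffs:
        return 0.0, 1.0
    ranks = midranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    doubled = [int(round(2 * r)) for r in ranks]  # mid-ranks become integers
    counts = Counter({0: 1})  # number of sign assignments per rank sum
    for r in doubled:
        updated = Counter(counts)          # rank r given a negative sign
        for total, n in counts.items():
            updated[total + r] += n        # rank r given a positive sign
        counts = updated
    target = int(round(2 * w_plus))
    extreme = sum(n for total, n in counts.items() if total >= target)
    return w_plus, extreme / 2 ** len(doubled)


# Hypothetical scores for five descriptors at the two durations.
scores_50ms = [12, 10, 11, 10, 13]
scores_17ms = [10, 0, 10, 9, 10]
w_plus, p_value = signed_rank_one_sided(scores_50ms, scores_17ms)
```

The exact enumeration is practical for the small numbers of paired
scores typical of a single report; for larger samples a normal
approximation or a statistics library would be the usual choice.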

To test the reproducibility of the results, we re-tested one
subject after a 9-day delay, so as to minimize priming or other
memory-related effects from the first session. The scores of
the two sessions are shown in Figures 4B,C. The scores were
statistically indistinguishable between the two sessions (2-way
ANOVA, session × stimulus duration; p < 0.05 for stimulus
duration factor and p > 0.05 for session and interaction factors).

The scores were also consistent between the two evaluators
across all datasets (Cronbach’s alpha test, α = 0.87; data not
shown). Thus, the scores did not significantly depend on the
particular evaluator used.
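
For readers unfamiliar with the statistic, Cronbach's alpha for k
evaluators is alpha = k/(k - 1) × (1 - (sum of per-evaluator score
variances) / (variance of the summed scores)). The following minimal
sketch computes it with the Python standard library; the function name
and the example scores are hypothetical and are not data from the
study.

```python
from statistics import variance


def cronbach_alpha(scores_by_evaluator):
    """Cronbach's alpha for inter-evaluator consistency.

    scores_by_evaluator -- list of equal-length score lists, one per
                           evaluator, aligned report by report.
    """
    k = len(scores_by_evaluator)
    if k < 2:
        raise ValueError("need at least two evaluators")
    # Sum of the sample variances of each evaluator's scores.
    item_variances = sum(variance(s) for s in scores_by_evaluator)
    # Sample variance of the per-report totals across evaluators.
    totals = [sum(per_report) for per_report in zip(*scores_by_evaluator)]
    return (k / (k - 1)) * (1 - item_variances / variance(totals))


# Hypothetical scores from two evaluators for the same four descriptors.
evaluator_a = [10, 12, 0, 11]
evaluator_b = [10, 11, 0, 12]
alpha = cronbach_alpha([evaluator_a, evaluator_b])
```

Values close to 1 indicate that the evaluators' scores rise and fall
together across reports.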

A  principled  validation  method  for  the  scoring  algorithm  is 
to  test  whether  the  scores  can  predict  the  corresponding  stimulus 
condition.  The  underlying  rationale  is  that  if  the  numerical  scores 
of  the  evaluators  reliably  reflect  the  reports,  and  the  reports  in  turn 
are  a  reliable  reflection  of  the  stimulus  duration,  then  it  should  be 
possible to predict the stimulus duration based on the corresponding
scores. We found this to be true for all six data sets (Spearman
rank  correlation,  r  >  0.67,  df  =  49  for  all  six  data  sets;  data  not 
shown),  indicating  that  the  scores  reliably  reflect  the  underlying 
stimulus  conditions. 
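
The Spearman rank correlation between scores and stimulus duration can
be sketched as the Pearson correlation of tie-averaged ranks. The
function names and example values below are hypothetical; because the
stimulus duration takes only two values, ties are handled with
mid-ranks.

```python
from math import sqrt


def midranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the mid-ranks."""
    rx, ry = midranks(x), midranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (spread_x * spread_y)


# Hypothetical example: three reports at each stimulus duration (ms).
durations = [17, 17, 17, 50, 50, 50]
scores = [0, 8, 9, 10, 11, 12]
r = spearman(durations, scores)
```

A large positive coefficient indicates that higher scores go with the
longer stimulus duration.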

We obtained qualitatively similar results in Experiment 2, which
used a different set of images, subjects and evaluators, compared
to those used in Experiment 1 (data not shown). Together, the
results of the two experiments indicate that our above results are
not idiosyncratic to the stimuli, subjects and evaluators used. Our
results also indicate that SDR is a sensitive technique that can
detect relatively subtle differences in visual perception, given the
fact that the difference in stimulus durations was relatively small
(17 ms vs. 50 ms).

DISCUSSION 

STRENGTHS  AND  POTENTIAL  APPLICATIONS  OF  SDR 

The main novelty of SDR is that it is a method for numerically
representing verbal descriptions of the ground truth (in the present
case, the visual images). To our knowledge, methods to do this
simply do not exist at present. Note that, reduced to its essentials,
SDR requires only that the ground truth that the verbal account
describes be available for independent evaluation. Given the
simplicity of its requirements, SDR is potentially applicable to a wide
variety of real-world applications in which qualitative,
verbal descriptions of real-world experiences need to be quantified
(see below).

Our experimental results demonstrate that SDR is useful for
quantifying qualitative reports of visual scenes. Although we
illustrate the method by varying stimulus duration, we expect that
the method should be applicable to any case in which subjective
experience, visual or otherwise, is verbally reported, by normal
subjects or patients, as long as the ground truth that elicited
the experience can be independently evaluated. It also stands to
reason that second-hand reports of percepts, such as a clinical
provider’s verbal observations of the patient’s behavior, can be
similarly quantified using the same underlying principles.

Three  main  strengths  of  SDR  are  particularly  worth  noting. 
First,  it  places  very  few  constraints  on  the  patients  (or  subjects), 
in  that  it  allows  patients  to  view  the  stimuli  freely  and  naturally, 
and  describe  their  percepts  in  their  own  words.  This  allows  the 
researcher,  clinician  or  the  machine  learning  algorithm  to  evaluate 
the subject/patient in a setting that is natural and minimally stressful.
In this sense, our method is different from other methods of
quantifying  qualitative  data,  which  generally  require  streamlining 
or  formatting  of  the  qualitative  data,  e.g.,  using  questionnaires 

FIGURE 4 | Comparison of subjects' reports for the short (17 ms) and
long (50 ms) stimulus durations. Subjects viewed images for either
stimulus duration, and reported their percepts using the paradigm
illustrated in Figure 2. Subjects' reports were scored by an evaluator as
illustrated in Figures 1 and 3, and the resulting scores are shown in this
figure in color-coded fashion. Panels A through C show data from three
different, representative data sets, each obtained using the same set of
images. Each row in each panel shows the numeric representation of a
single verbal report. Rows are matched across panels A-C, so that, for
instance, row 7 in each panel denotes reports of the same image by
different subjects and/or during different sessions. Each column denotes a
different descriptor. Note that the columns are not necessarily matched
across panels A-C (although they are exactly matched within each panel),
because the subjects did not necessarily describe the same set of image
elements even though the underlying images were the same. The order of
rows or columns has no particular meaning, so that the only meaningful
comparison is between paired cells within each data set. All data are
rendered on a black background according to the color scale at top right.
White cells denote image elements for which the subject used the same
descriptor as the baseline descriptor set by the given evaluator. Green and
red hues denote image descriptors that were, respectively, more specific
or less specific than the evaluator's baseline descriptors. Gray cells denote
the  descriptors  the  subject  used  during  one  viewing,  but  omitted  during 
the other. (A) Reports of Subject 00 as scored by Evaluator 02. Since this
particular subject reported ≤9 descriptors for any given image, there are
nine columns in this data set. (B) and (C) denote the reports of Subject
01 in two successive, duplicate sessions 9 days apart, as scored by the same
evaluator (Evaluator 01). See text for details.


or forms (for reviews, see Udupa, 1999; Auerbach and Silverstein,
2003; Gustafson and McCandless, 2010; Sauro and Lewis, 2012;
Bazeley, 2013). Second, SDR can, in principle, preserve much of
the richness of the verbal reports, depending on the rules and
algorithms used for evaluating the reports. Note also that the scores
need not necessarily be integer rank scores; it should be possible,
in principle, to develop algorithms for assigning fractional scores
that treat the underlying descriptors as values of a continuous
variable, rather than of a discrete or categorical variable. Third, as
noted above, this method is likely to be flexible and versatile, with
a broad array of potential applications, given that its requirements
are ultimately minimal, viz., a verbal description and the ground
truth that elicited the description. For this reason, SDR should be
applicable to a wide variety of stimuli (including drawings,
photographs, or videos, and non-visual stimuli such as sounds and
haptic objects) and to various aspects of the stimulus perceived
(such as some affective aspect of the stimulus, the texture of an
object, the origin of a sound, etc.). Thus, a pollster using focus
groups to evaluate the impact of a political or commercial
advertisement can use the same set of SDR principles as an
ophthalmologist or a neurologist evaluating a patient’s deficits in
one or more of the senses, an educator testing students, or a
recruiter testing the aptitude of applicants to comprehend complex
real-world situations.

It is worth noting that, as alluded to in the Results section,
machine learning methods can be devised to carry out the
aforementioned steps 2 (independent evaluation of the subjects’
reports) and 3 (post hoc statistical analyses of the evaluators’
reports) of SDR. This would make the given implementation of
SDR more objective by removing the contribution of the evaluators’
subjectivity from the process. In addition, our method has
potential  applications  to  machine  learning  itself,  because  it  allows 
machines  to  process  language  using  a  numerical  representation 
thereof.  To  our  knowledge,  methods  to  do  this  do  not  currently 
exist  either. 

SOME  IMPORTANT  CAVEATS  AND  POTENTIAL  FUTURE 
IMPROVEMENTS 

There are four caveats that are particularly important to note. First,
as noted earlier, our results only provide a “proof of concept,” and
do not, by themselves, fully validate this method. In order to
validate SDR, one needs to show that SDR independently produces
essentially the same results as those obtained by a different,
established method (for reviews, see Willig and Stainton-Rogers, 2008;
Denzin and Lincoln, 2011; Lezak, 2012). SDR also needs to be
standardized for each intended purpose. For instance, the conditions
under which it yields the most reliable results for a given purpose
(e.g., evaluating hemianopsia patients) remain to be delineated.
The scoring rules also need to be further developed and
standardized. Standardizing and cross-validating SDR will also help
further delineate its strengths, weaknesses, potential applications,
and limitations. Note that the fact that SDR needs to be developed and
refined further before it can be used in real-world applications does
not by itself undermine the value of the underlying concept. After
all, test development is necessarily an iterative process; any testing
method has to undergo the aforementioned development process
(Brennan et al., 2006; Downing and Haladyna, 2006; Phelps, 2007;
Gregory, 2010).

Second, SDR is meant not to supplant, but rather to supplement,
the existing qualitative and quantitative methods. This
caveat is particularly important in view of the fact that this method
is yet to be tested extensively, and its strengths and weaknesses
empirically documented. Specifically, it should be noted that SDR
is by no means a universally applicable method for quantifying
qualitative reports, especially in cases where the underlying
descriptors may not be reliably rank-ordered, e.g., in educational
research (Hartas, 2010; Torrance, 2010; Haghi and Rocci, 2013).
Moreover, as alluded to in the Results section, a verbal report,
however indirect, is a prerequisite of SDR.

Third,  the  numerical  scores  of  the  evaluators  are  meant  to  be 
used  in  statistical  tests  that  compare  the  relative  values,  not  the 
absolute  values,  of  the  scores,  such  as  rank-order  or  rank-sum 
tests.  This  is  because  our  tests  do  not  correct  for  the  criterion  level 
of  the  individual  evaluator,  e.g.,  whether  a  given  evaluator  may 
tend  to  score  the  reports  “generously.”  Using  the  relative  values  of 
the  scores  tends  to  correct  for  this,  although  only  to  the  extent  that 
a  given  evaluator’s  criterion  remains  unchanged  across  the  relevant 
dataset.  To  correct  for  these  criterion  effects,  and  to  obviate  the 
need  for  rank-based  statistics,  one  can  average  over  a  large  number 
of  randomly  chosen  evaluators.  For  instance,  one  can  create  a  large 
database  of  reports  and  scores  for  each  given  set  of  stimuli  that  can 
be  used  as  a  reference  distribution  to  correct  for  any  deviations 
from  the  norm.  Note,  incidentally,  that  having  such  a  database  also 
obviates  the  need  to  carry  out  paired  statistics  or  even  two-sample 
statistics,  because  the  researcher  can  always  compare  a  given  single 
sample,  e.g.,  a  given  subject’s  reports  for  a  stimulus  duration  of 
17  ms,  against  a  standard  reference  distribution  of  reports  for  that 
duration. 
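
The one-sample comparison against a reference distribution can be
sketched as follows. This is an illustrative sketch only: no such
reference database exists in the present study, and the function names
and reference scores below are hypothetical.

```python
from statistics import mean, stdev


def z_against_reference(query_score, reference_scores):
    """Standardize a query score against a reference sample."""
    return (query_score - mean(reference_scores)) / stdev(reference_scores)


def percentile_rank(query_score, reference_scores):
    """Percentage of reference scores below the query (ties count half)."""
    below = sum(s < query_score for s in reference_scores)
    ties = sum(s == query_score for s in reference_scores)
    return 100.0 * (below + 0.5 * ties) / len(reference_scores)


# Hypothetical reference scores for one descriptor of one image at one
# stimulus duration, pooled over many subjects and evaluators.
reference = [8, 9, 10, 10, 11, 12]

z = z_against_reference(12, reference)
pct = percentile_rank(10, reference)
```

The percentile rank depends only on the ordering of scores, whereas the
z-score uses their absolute values and therefore presupposes that
evaluator criterion effects have been averaged out in the reference
sample.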


Finally, SDR is based on the hierarchical nature of object percepts,
and therefore is not currently suited to evaluate percepts
that  are  not  hierarchical.  This  is  especially  true  of  affective  per¬ 
cepts.  However,  by  using  a  reference  distribution  as  outlined  above, 
one  can  extend  our  method  to  the  assessment  of  non-hierarchical 
percepts. 

RELATION  TO  PREVIOUS  WORK 

What is most novel about our approach is that it exploits the
hierarchical organization of natural objects to generate an arbitrarily
rich numeric representation of the reported visual percept. To the
best of our knowledge, methods to do this simply do not exist
at present. But other aspects of our method, including the use
of independent evaluators, have been previously used in other
studies of visual dysfunction as well as normal visual function
(Miles and Huberman, 1994; Glozman, 1999; Poreh, 2000;
Zipf-Williams et al., 2000; Joy et al., 2001; Ogden-Epker and Cullum,
2001; Fei-Fei et al., 2007). Having multiple evaluators independently
score the subjects’ reports is effective, because it tends
to average out random variance among evaluators while leaving
intact non-random variance; that is, using independent evaluators
helps achieve a measure of objectivity by way of shared
subjectivity (Hegde, 2008).

In  the  ultimate  analysis,  the  utility  of  SDR  is  that  it  provides 
a  novel  approach  to  grappling  with  the  breathtaking  complexity 
and  richness  of  our  subjective  visual  experience.  In  this  regard,  it 
is  of  great  potential  utility  in  research,  clinical  and  machine  vision 
contexts  alike. 